AAAI.2021 - Application Domains

Total: 88

#1 The Undergraduate Games Corpus: A Dataset for Machine Perception of Interactive Media [PDF] [Copy] [Kimi]

Authors: Barrett R. Anderson ; Adam M. Smith

Machine perception research primarily focuses on processing static inputs (e.g. images and texts). We are interested in machine perception of interactive media (such as games, apps, and complex web applications) where interactive audience choices have long-term implications for the audience experience. While there is ample research on AI methods for the task of playing games (often just one game at a time), this work is difficult to apply to new and in-development games or to use for non-playing tasks such as similarity-based retrieval or authoring assistance. In response, we contribute a corpus of 755 games and structured metadata, spread across several platforms (Twine, Bitsy, Construct, and Godot), with full source and assets available and appropriately licensed for use and redistribution in research. Because these games were sourced from student projects in an undergraduate game development program, they reference timely themes in their content and represent a variety of levels of design polish rather than only representing past commercial successes. This corpus could accelerate research in understanding interactive media while anchoring that work in freshly-developed games intended as legitimate human experiences (rather than lab-created AI testbeds). We validate the utility of this corpus by setting up the novel task of predicting tags relevant to the player experience from the game source code, showing that representations that better exploit the structure of the media outperform a text-only baseline.

#2 Efficient Poverty Mapping from High Resolution Remote Sensing Images [PDF] [Copy] [Kimi]

Authors: Kumar Ayush ; Burak Uzkent ; Kumar Tanmay ; Marshall Burke ; David Lobell ; Stefano Ermon

The combination of high-resolution satellite imagery and machine learning have proven useful in many sustainability-related tasks, including poverty prediction, infrastructure measurement, and forest monitoring. However, the accuracy afforded by high-resolution imagery comes at a cost, as such imagery is extremely expensive to purchase at scale. This creates a substantial hurdle to the efficient scaling and widespread adoption of high-resolution-based approaches. To reduce acquisition costs while maintaining accuracy, we propose a reinforcement learning approach in which free low-resolution imagery is used to dynamically identify where to acquire costly high-resolution images, prior to performing a deep learning task on the high-resolution images. We apply this approach to the task of poverty prediction in Uganda, building on an earlier approach that used object detection to count objects and use these counts to predict poverty. Our approach exceeds previous performance benchmarks on this task while using 80% fewer high-resolution images, and could be useful in many domains that require high-resolution imagery.

#3 Optimal Kidney Exchange with Immunosuppressants [PDF] [Copy] [Kimi]

Authors: Haris Aziz ; Ágnes Cseh ; John P. Dickerson ; Duncan C. McElfresh

Algorithms for exchange of kidneys is one of the key successful applications in market design, artificial intelligence, and operations research. Potent immunosuppressant drugs suppress the body's ability to reject a transplanted organ up to the point that a transplant across blood- or tissue-type incompatibility becomes possible. In contrast to the standard kidney exchange problem, we consider a setting that also involves the decision about which recipients receive from the limited supply of immunosuppressants that make them compatible with originally incompatible kidneys. We firstly present a general computational framework to model this problem. Our main contribution is a range of efficient algorithms that provide flexibility in terms of meeting meaningful objectives. Motivated by the current reality of kidney exchanges using sophisticated mathematical-programming-based clearing algorithms, we then present a general but scalable approach to optimal clearing with immunosuppression; we validate our approach on realistic data from a large fielded exchange.

#4 TreeCaps: Tree-Based Capsule Networks for Source Code Processing [PDF] [Copy] [Kimi]

Authors: Nghi D. Q. Bui ; Yijun Yu ; Lingxiao Jiang

Recently program learning techniques have been proposed to process source code based on syntactical structures (e.g., abstract syntax trees) and/or semantic information (e.g., dependency graphs). While graphs may be better than trees at capturing code semantics, constructing the graphs from code inputs through the semantic analysis of multiple viewpoints can lead to inaccurate noises for a specific software engineering task. Compared to graphs, syntax trees are more precisely defined on the grammar and easier to parse; unfortunately, previous tree-based learning techniques have not been able to learn semantic information from trees to achieve better accuracy than graph-based techniques. We have proposed a new learning technique, named TreeCaps, by fusing together capsule networks with tree-based convolutional neural networks to achieve a learning accuracy higher than some existing graph-based techniques while it is based only on trees. TreeCaps introduces novel variable-to-static routing algorithms into the capsule networks to compensate for the loss of previous routing algorithms. Aside from accuracy, we also find that TreeCaps is the most robust to withstand those semantic-preserving program transformations that change code syntax without modifying the semantics. Evaluated on a large number of Java and C/C++ programs, TreeCaps models outperform prior deep learning models of program source code, in terms of both accuracy and robustness for program comprehension tasks such as code functionality classification and function name prediction. Our implementation is publicly available at: https://github.com/bdqnghi/treecaps.

#5 A Bottom-Up DAG Structure Extraction Model for Math Word Problems [PDF] [Copy] [Kimi]

Authors: Yixuan Cao ; Feng Hong ; Hongwei Li ; Ping Luo

Research on automatically solving mathematical word problems (MWP) has a long history. Most recent works adopt Seq2Seq approach to predict the result equations as a sequence of quantities and operators. Although result equations can be written as a sequence, it is essentially a structure. More precisely, it is a Direct Acyclic Graph (DAG) whose leaf nodes are the quantities, and internal and root nodes are arithmetic or comparison operators. In this paper, we propose a novel Seq2DAG approach to extract the equation set directly as a DAG structure. It is extracted in a bottom-up fashion by aggregating quantities and sub-expressions layer by layer iteratively. The advantages of our approach approach are three-fold: it is intrinsically suitable to solve multivariate problems, it always outputs valid structure, and its computation satisfies commutative law for +, x and =. Experimental results on Math23K and DRAW1K demonstrate that our model outperforms state-of-the-art deep learning methods. We also conduct detailed analysis on the results to show the strengths and limitations of our approach.

#6 Diagnose Like A Pathologist: Weakly-Supervised Pathologist-Tree Network for Slide-Level Immunohistochemical Scoring [PDF] [Copy] [Kimi]

Authors: Zhen Chen ; Jun Zhang ; Shuanlong Che ; Junzhou Huang ; Xiao Han ; Yixuan Yuan

The immunohistochemistry (IHC) test of biopsy tissue is crucial to develop targeted treatment and evaluate prognosis for cancer patients. The IHC staining slide is usually digitized into the whole-slide image (WSI) with gigapixels for quantitative image analysis. To perform a whole image prediction (e.g., IHC scoring, survival prediction, and cancer grading) from this kind of high-dimensional image, algorithms are often developed based on multi-instance learning (MIL) framework. However, the multi-scale information of WSI and the associations among instances are not well explored in existing MIL based studies. Inspired by the fact that pathologists jointly analyze visual fields at multiple powers of objective for diagnostic predictions, we propose a Pathologist-Tree Network (PTree-Net) to sparsely model the WSI efficiently in multi-scale manner. Specifically, we propose a Focal-Aware Module (FAM) that can approximately estimate diagnosis-related regions with an extractor trained using the thumbnail of WSI. With the initial diagnosis-related regions, we hierarchically model the multi-scale patches in a tree structure, where both the global and local information can be captured. To explore this tree structure in an end-to-end network, we propose a patch Relevance-enhanced Graph Convolutional Network (RGCN) to explicitly model the correlations of adjacent parent-child nodes, accompanied by patch relevance to exploit the implicit contextual information among distant nodes. In addition, tree-based self-supervision is devised to improve representation learning and suppress irrelevant instances adaptively. Extensive experiments are performed on a large-scale IHC HER2 dataset. The ablation study confirms the effectiveness of our design, and our approach outperforms state-of-the-art by a large margin.

#7 Modeling the Momentum Spillover Effect for Stock Prediction via Attribute-Driven Graph Attention Networks [PDF] [Copy] [Kimi]

Authors: Rui Cheng ; Qing Li

In finance, the momentum spillovers of listed firms is well acknowledged. Only few studies predicted the trend of one firm in terms of its relevant firms. A common strategy of the pilot work is to adopt graph convolution networks (GCNs) with some predefined firm relations. However, momentum spillovers are propagated via a variety of firm relations, of which the bridging importance varies with time. Restricting to several predefined relations inevitably makes noise and thus misleads stock predictions. In addition, traditional GCNs transfer and aggregate the peer influences without considering the states of both connected firms once a connection is built. Such non-attribute sensibility makes traditional GCNs inappropriate to deal with the attribute-sensitive momentum spillovers of listed firms wherein the abnormal price drop of one firm may not spill over if the trade volume of this decreasing price is small or the prices of the linked firms are undervalued. In this study, we propose an attribute-driven graph attention network (AD-GAT) to address both problems in modeling momentum spillovers. This is achieved by element-wisely multiplying the nonlinear transformation of the attributes of the connected firms with the attributes of the source firm to consider its attribute-sensitive momentum spillovers, and applying the unmasked attention mechanism to infer the general dynamic firm relation from observed market signals fused by a novel tensor-based feature extractor. Experiments on the three-year data of the S&P 500 demonstrate the superiority of the proposed framework over stateof-the-art algorithms, including GCN, eLSTM, and TGC.

#8 Differentially Private Link Prediction with Protected Connections [PDF] [Copy] [Kimi]

Authors: Abir De ; Soumen Chakrabarti

Link prediction (LP) algorithms propose to each node a ranked list of nodes that are currently non-neighbors, as the most likely candidates for future linkage. Owing to increasing concerns about privacy, users (nodes) may prefer to keep some of their connections protected or private. Motivated by this observation, our goal is to design a differentially private LP algorithm, which trades off between privacy of the protected node-pairs and the link prediction accuracy. More specifically, we first propose a form of differential privacy on graphs, which models the privacy loss only of those node-pairs which are marked as protected. Next, we develop DPLP, a learning to rank algorithm, which applies a monotone transform to base scores from a non-private LP system, and then adds noise. DPLP is trained with a privacy induced ranking loss, which optimizes the ranking utility for a given maximum allowed level of privacy leakage of the protected node-pairs. Under a recently introduced latent node embedding model, we present a formal trade-off between privacy and LP utility. Extensive experiments with several real-life graphs and several LP heuristics show that DPLP can trade off between privacy and predictive performance more effectively than several alternatives.

#9 Graph Neural Network to Dilute Outliers for Refactoring Monolith Application [PDF] [Copy] [Kimi]

Authors: Utkarsh Desai ; Sambaran Bandyopadhyay ; Srikanth Tamilselvam

Microservices are becoming the defacto design choice for software architecture. It involves partitioning the software components into finer modules such that the development can happen independently. It also provides natural benefits when deployed on the cloud since resources can be allocated dynamically to necessary components based on demand. Therefore, enterprises as part of their journey to cloud, are increasingly looking to refactor their monolith application into one or more candidate microservices; wherein each service contains a group of software entities (e.g., classes) that are responsible for a common functionality. Graphs are a natural choice to represent a software system. Each software entity can be represented as nodes and its dependencies with other entities as links. Therefore, this problem of refactoring can be viewed as a graph based clustering task. In this work, we propose a novel method to adapt the recent advancements in graph neural networks in the context of code to better understand the software and apply them in the clustering task. In that process, we also identify the outliers in the graph which can be directly mapped to top refactor candidates in the software. Our solution is able to improve state-of-the-art performance compared to works from both software engineering and existing graph representation based techniques.

#10 KAN: Knowledge-aware Attention Network for Fake News Detection [PDF] [Copy] [Kimi]

Authors: Yaqian Dun ; Kefei Tu ; Chen Chen ; Chunyan Hou ; Xiaojie Yuan

The explosive growth of fake news on social media has drawn great concern both from industrial and academic communities. There has been an increasing demand for fake news detection due to its detrimental effects. Generally, news content is condensed and full of knowledge entities. However, existing methods usually focus on the textual contents and social context, and ignore the knowledge-level relationships among news entities. To address this limitation, in this paper, we propose a novel Knowledge-aware Attention Network (KAN) that incorporates external knowledge from knowledge graph for fake news detection. Firstly, we identify entity mentions in news contents and align them with the entities in knowledge graph. Then, the entities and their contexts are used as external knowledge to provide complementary information. Finally, we design News towards Entities (N-E) attention and News towards Entities and Entity Contexts (N-E^2C) attention to measure the importances of knowledge. Thus, our proposed model can incorporate both semantic-level and knowledge-level representations of news to detect fake news. Experimental results on three public datasets show that our model outperforms the state-of-the-art methods, and also validate the effectiveness of knowledge attention.

#11 When Hashing Met Matching: Efficient Spatio-Temporal Search for Ridesharing [PDF] [Copy] [Kimi]

Author: Chinmoy Dutta

Shared on-demand mobility holds immense potential for urban transportation. However, finding ride matches in real-time at urban scale is a very difficult combinatorial optimization problem and mostly heuristic approaches are applied. In this work, we introduce a principled approach to this combinatorial problem. Our approach proceeds by constructing suitable representations for rides and driver routes capturing their essential spatio-temporal aspects in an appropriate vector space, and defining a similarity metric in this space that expresses matching utility. This then lets us mathematically model the problem of finding ride matches as that of Near Neighbor Search (NNS). Exploiting this modeling, we devise a novel spatio-temporal search algorithm for finding ride matches based on the theory of Locality Sensitive Hashing (LSH). Apart from being highly efficient, our algorithm enjoys several practically useful properties and extension possibilities. Experiments with large real-world datasets show that our algorithm consistently outperforms state-of-the-art heuristic methods thereby proving its practical applicability.

#12 Gene Regulatory Network Inference using 3D Convolutional Neural Network [PDF] [Copy] [Kimi]

Authors: Yue Fan ; Xiuli Ma

Gene regulatory networks (GRNs) consist of gene regulations between transcription factors (TFs) and their target genes. Single-cell RNA sequencing (scRNA-seq) brings both opportunities and challenges to the inference of GRNs. On the one hand, scRNA-seq data reveals statistic information of gene expressions at the single-cell resolution, which is conducive to the construction of GRNs; on the other hand, noises and dropouts pose great difficulties on the analysis of scRNA-seq data, causing low prediction accuracy by traditional methods. In this paper, we propose 3D Co-Expression Matrix Analysis (3DCEMA), which predicts regulatory relationships by classifying 3D co-expression matrices of gene triples using a 3D convolutional neural network. We found that by introducing a third gene as a comparison factor, our method can avoid the disturbance of noises and dropouts, and significantly increase the prediction accuracy of regulations between gene pairs. Compared with other existing GRN inference algorithms on both in-silico datasets and scRNA-Seq datasets, our algorithm based on deep learning shows higher stability and accuracy in the task of GRN inference.

#13 Universal Trading for Order Execution with Oracle Policy Distillation [PDF] [Copy] [Kimi]

Authors: Yuchen Fang ; Kan Ren ; Weiqing Liu ; Dong Zhou ; Weinan Zhang ; Jiang Bian ; Yong Yu ; Tie-Yan Liu

As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. Towards effective execution strategy, recent years have witnessed the shift from the analytical view with model-based market assumptions to model-free perspective, i.e., reinforcement learning, due to its nature of sequential decision optimization. However, the noisy and yet imperfect market information that can be leveraged by the policy has made it quite challenging to build up sample efficient reinforcement learning methods to achieve effective order execution. In this paper, we propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution. Particularly, this framework leverages a policy distillation method that can better guide the learning of the common policy towards practically optimal execution by an oracle teacher with perfect information to approximate the optimal trading strategy. The extensive experiments have shown significant improvements of our method over various strong baselines, with reasonable trading actions.

#14 Dual-Octave Convolution for Accelerated Parallel MR Image Reconstruction [PDF] [Copy] [Kimi]

Authors: Chun-Mei Feng ; Zhanyuan Yang ; Geng Chen ; Yong Xu ; Ling Shao

Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration by obtaining multiple undersampled images simultaneously through parallel imaging has always been the subject of research. In this paper, we propose the Dual-Octave Convolution (Dual-OctConv), which is capable of learning multi-scale spatial-frequency features from both real and imaginary components, for fast parallel MR image reconstruction. By reformulating the complex operations using octave convolutions, our model shows a strong ability to capture richer representations of MR images, while at the same time greatly reducing the spatial redundancy. More specifically, the input feature maps and convolutional kernels are first split into two components (i.e., real and imaginary), which are then divided into four groups according to their spatial frequencies. Then, our Dual-OctConv conducts intra-group information updating and inter-group information exchange to aggregate the contextual information across different groups. Our framework provides two appealing benefits: (i) it encourages interactions between real and imaginary components at various spatial frequencies to achieve richer representational capacity, and (ii) it enlarges the receptive field by learning multiple spatial-frequency features of both the real and imaginary components. We evaluate the performance of the proposed model on the acceleration of multi-coil MR image reconstruction. Extensive experiments are conducted on an {in vivo} knee dataset under different undersampling patterns and acceleration factors. The experimental results demonstrate the superiority of our model in accelerated parallel MR image reconstruction. Our code is available at: github.com/chunmeifeng/Dual-OctConv.

#15 MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [PDF] [Copy] [Kimi]

Authors: Tianfan Fu ; Cao Xiao ; Xinhao Li ; Lucas M. Glass ; Jimeng Sun

Molecule optimization is a fundamental task for accelerating drug discovery, with the goal of generating new valid molecules that maximize multiple drug properties while maintaining similarity to the input molecule. Existing generative models and reinforcement learning approaches made initial success, but still face difficulties in simultaneously optimizing multiple drug properties. To address such challenges, we propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework to use input molecule as an initial guess and sample molecules from the target distribution. MIMOSA first pretrains two property agnostic graph neural networks (GNNs) for molecule topology and substructure-type prediction, where a substructure can be either atom or single ring. For each iteration, MIMOSA uses the GNNs’ prediction and employs three basic substructure operations (add, replace, delete) to generate new molecules and associated weights. The weights can encode multiple constraints including similarity and drug property constraints, upon which we select promising molecules for next iteration. MIMOSA enables flexible encoding of multiple property- and similarity-constraints and can efficiently generate new molecules that satisfy various property constraints and achieved up to 49.1% relative improvement over the best baseline in terms of success rate.

#16 ECG ODE-GAN: Learning Ordinary Differential Equations of ECG Dynamics via Generative Adversarial Learning [PDF] [Copy] [Kimi]

Authors: Tomer Golany ; Daniel Freedman ; Kira Radinsky

Understanding the dynamics of complex biological and physiological systems has been explored for many years in the form of physically-based mathematical simulators. The behavior of a physical system is often described via ordinary differential equations (ODE), referred to as the dynamics. In the standard case, the dynamics are derived from purely physical considerations. By contrast, in this work we study how the dynamics can be learned by a generative adversarial network which combines both physical and data considerations. As a use case, we focus on the dynamics of the heart signal electrocardiogram (ECG). We begin by introducing a new GAN framework, dubbed ODE-GAN, in which the generator learns the dynamics of a physical system in the form of an ordinary differential equation. Specifically, the generator network receives as input a value at a specific time step, and produces the derivative of the system at that time step. Thus, the ODE-GAN learns purely data-driven dynamics. We then show how to incorporate physical considerations into ODE-GAN. We achieve this through the introduction of an additional input to the ODE-GAN generator: physical parameters, which partially characterize the signal of interest. As we focus on ECG signals, we refer to this new framework as ECG-ODE-GAN. We perform an empirical evaluation and show that generating ECG heartbeats from our learned dynamics improves ECG heartbeat classification.

#17 Towered Actor Critic For Handling Multiple Action Types In Reinforcement Learning For Drug Discovery [PDF] [Copy] [Kimi]

Authors: Sai Krishna Gottipati ; Yashaswi Pathak ; Boris Sattarov ; Sahir ; Rohan Nuttall ; Mohammad Amini ; Matthew E. Taylor ; Sarath Chandar

Reinforcement learning (RL) has made significant progress in both abstract and real-world domains, but the majority of state-of-the-art algorithms deal only with monotonic actions. However, some applications require agents to reason over different types of actions. Our application simulates reaction-based molecule generation, used as part of the drug discovery pipeline, and includes both uni-molecular and bi-molecular reactions. This paper introduces a novel framework, towered actor critic (TAC), to handle multiple action types. The TAC framework is general in that it is designed to be combined with any existing RL algorithms for continuous action space. We combine it with TD3 to empirically obtain significantly better results than existing methods in the drug discovery setting. TAC is also applied to RL benchmarks in OpenAI Gym and results show that our framework can improve, or at least does not hurt, performance relative to standard TD3.

#18 Hierarchical Graph Convolution Network for Traffic Forecasting [PDF] [Copy] [Kimi]

Authors: Kan Guo ; Yongli Hu ; Yanfeng Sun ; Sean Qian ; Junbin Gao ; Baocai Yin

Traffic forecasting is attracting considerable interest due to its widespread application in intelligent transportation systems. Given the complex and dynamic traffic data, many methods focus on how to establish a spatial-temporal model to express the non-stationary traffic patterns. Recently, the latest Graph Convolution Network (GCN) has been introduced to learn spatial features while the time neural networks are used to learn temporal features. These GCN based methods obtain state-of-the-art performance. However, the current GCN based methods ignore the natural hierarchical structure of traffic systems which is composed of the micro layers of road networks and the macro layers of region networks, in which the nodes are obtained through pooling method and could include some hot traffic regions such as downtown and CBD etc., while the current GCN is only applied on the micro graph of road networks. In this paper, we propose a novel Hierarchical Graph Convolution Networks (HGCN) for traffic forecasting by operating on both the micro and macro traffic graphs. The proposed method is evaluated on two complex city traffic speed datasets. Compared to the latest GCN based methods like Graph WaveNet, the proposed HGCN gets higher traffic forecasting precision with lower computational cost.The website of the code is https://github.com/guokan987/HGCN.git.

#19 Automated Lay Language Summarization of Biomedical Scientific Reviews [PDF] [Copy] [Kimi]

Authors: Yue Guo ; Wei Qiu ; Yizhong Wang ; Trevor Cohen

Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring treatment outcomes. However, medical jargon and the complex structure of professional language in this domain make health information especially hard to interpret. Thus, there is an urgent unmet need for automated methods to enhance the accessibility of the biomedical literature to the general population. This problem can be framed as a type of translation problem between the language of healthcare professionals, and that of the general public. In this paper, we introduce the novel task of automated generation of lay language summaries of biomedical scientific reviews, and construct a dataset to support the development and evaluation of automated methods through which to enhance the accessibility of the biomedical literature. We conduct analyses of the various challenges in performing this task, including not only summarization of the key points but also explanation of background knowledge and simplification of professional language. We experiment with state-of-the-art summarization models as well as several data augmentation techniques, and evaluate their performance using both automated metrics and human assessment. Results indicate that automatically generated summaries produced using contemporary neural architectures can achieve promising quality and readability as compared with reference summaries developed for the lay public by experts (best ROUGE-L of 50.24 and Flesch-Kincaid readability score of 13.30). We also discuss the limitations of the current effort, providing insights and directions for future work.

#20 Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances [PDF] [Copy] [Kimi]

Authors: Sijie He ; Xinyan Li ; Timothy DelSole ; Pradeep Ravikumar ; Arindam Banerjee

Sub-seasonal forecasting (SSF) focuses on predicting key variables such as temperature and precipitation on the 2-week to 2-month time scale. Skillful SSF would have immense societal value in such areas as agricultural productivity, water resource management, and emergency planning for extreme weather events. However, SSF is considered more challenging than either weather prediction or even seasonal prediction, and is still a largely understudied problem. In this paper, we carefully investigate 10 Machine Learning (ML) approaches to sub-seasonal temperature forecasting over the contiguous U.S. on the SSF dataset we collect, including a variety of climate variables from the atmosphere, ocean, and land. Because of the complicated atmosphere-land-ocean couplings and the limited amount of good quality observational data, SSF imposes a great challenge for ML despite the recent advances in various domains. Our results indicate that suitable ML models, e.g., XGBoost, to some extent, capture the predictability on sub-seasonal time scales and can outperform the climatological baselines, while Deep Learning (DL) models barely manage to match the best results with carefully designed architecture. Besides, our analysis and exploration provide insights on important aspects to improve the quality of sub-seasonal forecasts, e.g., feature representation and model architecture. The SSF dataset and code are released with this paper for use by the broader research community.

#21 Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs [PDF] [Copy] [Kimi]

Authors: Wen-Yi Hsiao ; Jen-Yu Liu ; Yin-Cheng Yeh ; Yi-Hsuan Yang

To apply neural sequence models such as the Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite set of pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note’s pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While different types of tokens may possess different properties, existing models usually treat them equally, in the same way as modeling words in natural languages. In this paper, we present a conceptually different approach that explicitly takes into account the type of the tokens, such as note types and metric types. And, we propose a new Transformer decoder architecture that uses different feed-forward heads to model tokens of different types. With an expansion-compression trick, we convert a piece of music to a sequence of compound words by grouping neighboring tokens, greatly reducing the length of the token sequences. We show that the resulting model can be viewed as a learner over dynamic directed hypergraphs. And, we employ it to learn to compose expressive Pop piano music of full-song length (involving up to 10K individual tokens per song), both conditionally and unconditionally. Our experiment shows that, compared to state-of-the-art models, the proposed model converges 5 to 10 times faster at training (i.e., within a day on a single GPU with 11 GB memory), and with comparable quality in the generated music

#22 Modeling the Compatibility of Stem Tracks to Generate Music Mashups [PDF] [Copy] [Kimi]

Authors: Jiawen Huang ; Ju-Chiang Wang ; Jordan B. L. Smith ; Xuchen Song ; Yuxuan Wang

A music mashup combines audio elements from two or more songs to create a new work. To reduce the time and effort required to make them, researchers have developed algorithms that predict the compatibility of audio elements. Prior work has focused on mixing unaltered excerpts, but advances in source separation enable the creation of mashups from isolated stems (e.g., vocals, drums, bass, etc.). In this work, we take advantage of separated stems not just for creating mashups, but for training a model that predicts the mutual compatibility of groups of excerpts, using self-supervised and semi-supervised methods. Specifically, we first produce a random mashup creation pipeline that combines stem tracks obtained via source separation, with key and tempo automatically adjusted to match, since these are prerequisites for high-quality mashups. To train a model to predict compatibility, we use stem tracks obtained from the same song as positive examples, and random combinations of stems with key and/or tempo unadjusted as negative examples. To improve the model and use more data, we also train on "average" examples: random combinations with matching key and tempo, where we treat them as unlabeled data as their true compatibility is unknown. To determine whether the combined signal or the set of stem signals is more indicative of the quality of the result, we experiment on two model architectures and train them using semi-supervised learning technique. Finally, we conduct objective and subjective evaluations of the system, comparing them to a standard rule-based system.

#23 SDGNN: Learning Node Representation for Signed Directed Networks [PDF] [Copy] [Kimi]

Authors: Junjie Huang ; Huawei Shen ; Liang Hou ; Xueqi Cheng

Network embedding is aimed at mapping nodes in a network into low-dimensional vector representations. Graph Neural Networks (GNNs) have received widespread attention and lead to state-of-the-art performance in learning node representations. However, most GNNs only work in unsigned networks, where only positive links exist. It is not trivial to transfer these models to signed directed networks, which are widely observed in the real world yet less studied. In this paper, we first review two fundamental sociological theories (i.e., status theory and balance theory) and conduct empirical studies on real-world datasets to analyze the social mechanism in signed directed networks. Guided by related socio- logical theories, we propose a novel Signed Directed Graph Neural Networks model named SDGNN to learn node embeddings for signed directed networks. The proposed model simultaneously reconstructs link signs, link directions, and signed directed triangles. We validate our model’s effectiveness on five real-world datasets, which are commonly used as the benchmark for signed network embeddings. Experiments demonstrate the proposed model outperforms existing models, including feature-based methods, network embedding methods, and several GNN methods.

#24 The Causal Learning of Retail Delinquency [PDF] [Copy] [Kimi]

Authors: Yiyan Huang ; Cheuk Hang Leung ; Xing Yan ; Qi Wu ; Nanbo Peng ; Dongdong Wang ; Zhixiang Huang

This paper focuses on the expected difference in borrower's repayment when there is a change in the lender's credit decisions. Classical estimators overlook the confounding effects and hence the estimation error can be magnificent. As such, we propose another approach to construct the estimators such that the error can be greatly reduced. The proposed estimators are shown to be unbiased, consistent, and robust through a combination of theoretical analysis and numerical testing. Moreover, we compare the power of estimating the causal quantities between the classical estimators and the proposed estimators. The comparison is tested across a wide range of models, including linear regression models, tree-based models, and neural network-based models, under different simulated datasets that exhibit different levels of causality, different degrees of nonlinearity, and different distributional properties. Most importantly, we apply our approaches to a large observational dataset provided by a global technology firm that operates in both the e-commerce and the lending business. We find that the relative reduction of estimation error is strikingly substantial if the causal effects are accounted for correctly.

#25 Deep Portfolio Optimization via Distributional Prediction of Residual Factors [PDF] [Copy] [Kimi]

Authors: Kentaro Imajo ; Kentaro Minami ; Katsuya Ito ; Kei Nakagawa

Recent developments in deep learning techniques have motivated intensive research in machine learning-aided stock trading strategies. However, since the financial market has a highly non-stationary nature hindering the application of typical data-hungry machine learning methods, leveraging financial inductive biases is important to ensure better sample efficiency and robustness. In this study, we propose a novel method of constructing a portfolio based on predicting the distribution of a financial quantity called residual factors, which is known to be generally useful for hedging the risk exposure to common market factors. The key technical ingredients are twofold. First, we introduce a computationally efficient extraction method for the residual information, which can be easily combined with various prediction algorithms. Second, we propose a novel neural network architecture that allows us to incorporate widely acknowledged financial inductive biases such as amplitude invariance and time-scale invariance. We demonstrate the efficacy of our method on U.S. and Japanese stock market data. Through ablation experiments, we also verify that each individual technique contributes to improving the performance of trading strategies. We anticipate our techniques may have wide applications in various financial problems.